
## Presentation: intro, problem, importance, data, pipeline, technologies
## Presentation: insights, trends, maps; description and visualisation of selected features

Source: contribution of Detroit, MI to the “Police Data Initiative”
Nature: reports from the police records management system
Sample:



Import the packages, then load the data either the non-traditional way (via a database connection) or the traditional way (by reading from local disk).
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datashader as ds
import datashader.transfer_functions as tf
import holoviews as hv
from holoviews.operation.datashader import datashade
from holoviews import opts, dim
from colorcet import fire

hv.extension('matplotlib')
datashade.cmap = fire[50:]

import src.db_connector

PATH_TO_PROC_DATA = 'data\\interim\\'
```
Load or read the data:
```python
%%time
LOAD_REQ = False
if LOAD_REQ:
    # get_many_items returns an iterator, so wrap it in a DataFrame container
    df = pd.DataFrame(src.db_connector.get_many_items('crimes', 'detroit'))
else:
    df = pd.read_csv(PATH_TO_PROC_DATA + 'RMS_Crime_Incidents_modified.csv',
                     index_col=0, low_memory=False)

# extra preprocessing for the categorical columns
df['Crime Against'] = df['Crime Against'].astype('category')
df['year'] = df['year'].astype('category')
df['Crime Against codes'] = df['Crime Against'].cat.codes
```
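As a quick illustration of what `cat.codes` produces, here is a minimal sketch on an invented toy frame (the category values below are placeholders, not necessarily the real NIBRS labels):

```python
import pandas as pd

# Toy stand-in for the real 'Crime Against' column (values are invented).
df_toy = pd.DataFrame({'Crime Against': ['PERSON', 'PROPERTY', 'SOCIETY', 'PERSON']})
df_toy['Crime Against'] = df_toy['Crime Against'].astype('category')
df_toy['Crime Against codes'] = df_toy['Crime Against'].cat.codes

# Codes follow the lexicographic order of the categories:
# PERSON -> 0, PROPERTY -> 1, SOCIETY -> 2
print(df_toy['Crime Against codes'].tolist())  # [0, 1, 2, 0]
```

The integer codes are convenient as a numeric colour dimension for the 3-D scatter plots further below.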
Number of reported crimes:
Zip codes of reported offenses
Distributions of reported crimes by hour and by weekday
| Per hour | Per weekday |
|---|---|
| ![]() | ![]() |
To reproduce the numbers:
```python
df.groupby(['year']).crime_id.count().quantile(0.5)
df.groupby(['year', 'incident_timestamp_dt_hour']).crime_id.count().median()
df.groupby('incident_timestamp_dt_hour').crime_id.count().plot.bar()
df.groupby('incident_timestamp_dt_day_of_week').crime_id.count().plot.bar()
```
These plots appear earlier in this file.
Let's also plot some auxiliary statistics, which will feed into the general insights.
Some interesting insights:
SEX related:
General:
```python
df[df.offense_description.str.contains('PROSTITUT')].groupby('incident_timestamp_dt_day_of_week').crime_id.count().plot.bar()
df[df.offense_category.str.contains('SEX OFFENSES')].groupby('incident_timestamp_dt_day_of_week').crime_id.count().plot.bar()
df[df.offense_description.str.contains('PROSTITUT')].groupby(['X', 'Y']).count().crime_id.sort_values()
df[df.offense_description.str.contains('PROSTITUT')].groupby(['address']).count().crime_id.sort_values()
```
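The groupby-count-sort pattern used above to rank hotspot locations can be sanity-checked on synthetic data (the addresses below are invented, not taken from the dataset):

```python
import pandas as pd

# Synthetic incident table: one row per reported offense.
incidents = pd.DataFrame({
    'crime_id': range(6),
    'address': ['MAIN ST', 'OAK AVE', 'MAIN ST', 'MAIN ST', 'OAK AVE', 'ELM DR'],
})

# Count incidents per address; sort_values() puts the busiest address last,
# which is how the cells above are read.
hotspots = incidents.groupby('address').crime_id.count().sort_values()
print(hotspots.index[-1], hotspots.iloc[-1])  # MAIN ST 3
```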
Overall offense counts per different time slices:
Per month of a year
```python
df.groupby('incident_timestamp_dt_month').crime_id.count().plot.bar(ylim=(15000, 26000), zorder=2,
                                                                    figsize=(10, 5))
plt.xlabel('Month of the year')
plt.ylabel('Number of offenses per period')
plt.grid(alpha=0.5, axis='y', zorder=-1)
# plt.savefig('overall_per_month_detroit_time.png', dpi=400)
```
Per day of a month
```python
df.groupby('incident_timestamp_dt_day_of_month').crime_id.count().plot.bar(ylim=(2000, 12000), zorder=2,
                                                                           figsize=(10, 5))
plt.xlabel('Day of the month')
plt.ylabel('Number of offenses per period')
plt.grid(alpha=0.5, axis='y', zorder=-1)
# plt.savefig('overall_per_day_of_month_detroit_time.png', dpi=400)
```
Per day of a week
```python
df.groupby('incident_timestamp_dt_day_of_week').crime_id.count().plot.bar(ylim=(35000, 40000), zorder=2,
                                                                          figsize=(10, 5))
plt.xlabel('Day of the week, Monday=0, Sunday=6')
plt.ylabel('Number of offenses per period')
plt.grid(alpha=0.5, axis='y', zorder=-1)
# plt.savefig('overall_per_weekday_detroit_time.png', dpi=400)
```
Per hour of a day
```python
df.groupby(['incident_timestamp_dt_hour']).crime_id.count().plot.bar(zorder=2, figsize=(10, 5))
plt.xlabel('Hour of the day')
plt.ylabel('Number of offenses per hour')
plt.grid(alpha=0.5, axis='y', zorder=-1)
# plt.savefig('overall_per_hour_detroit_time.png', dpi=400)
```
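All four bar charts rely on precomputed `incident_timestamp_dt_*` columns. Assuming they come from a raw timestamp column, they can be derived with the pandas `.dt` accessor; a sketch with synthetic timestamps:

```python
import pandas as pd

# Synthetic timestamps standing in for the real incident timestamps.
ts = pd.Series(pd.to_datetime([
    '2019-01-07 08:30', '2019-01-07 21:10',
    '2019-02-14 21:45', '2019-03-01 08:05',
]))
frame = pd.DataFrame({'crime_id': range(len(ts))})
frame['incident_timestamp_dt_month'] = ts.dt.month
frame['incident_timestamp_dt_day_of_month'] = ts.dt.day
frame['incident_timestamp_dt_day_of_week'] = ts.dt.dayofweek  # Monday=0, Sunday=6
frame['incident_timestamp_dt_hour'] = ts.dt.hour

# The same groupby that feeds each bar chart above:
per_hour = frame.groupby('incident_timestamp_dt_hour').crime_id.count()
print(per_hour.to_dict())  # {8: 2, 21: 2}
```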
We have the reported crimes and offenses together with their known geo-locations, so we can combine the two. We will use several approaches:
* Also, note that the 'inner parts' are not areas without offenses and crimes: they are two municipalities that, for historical tax reasons, were never incorporated into Detroit. They don't share their statistics with our source, and at times didn't even have their own police force.

In addition, we need the boundaries of the city's inner administrative divisions, such as municipalities and districts, to extract deeper trends from our statistics.
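One way to attach incidents to such divisions, assuming the boundary polygons are available, is a point-in-polygon test. `matplotlib.path.Path` (matplotlib is already imported above) is enough for a sketch; the square 'district' below is invented:

```python
import numpy as np
from matplotlib.path import Path

# Hypothetical district boundary: a polygon of (X, Y) vertices.
district = Path([(-83.10, 42.35), (-83.00, 42.35),
                 (-83.00, 42.45), (-83.10, 42.45)])

# Incident coordinates: two inside the square, one outside.
points = np.array([(-83.05, 42.40), (-83.02, 42.36), (-83.20, 42.40)])
inside = district.contains_points(points)
print(inside.tolist())  # [True, True, False]
```

For real district boundaries a dedicated library such as geopandas would be more convenient, but the idea is the same.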
Let's look at the peak points and the geo-distribution, and get familiar with the city's parts.
To see the dynamic version, open the Kepler-based web interface (loading it may take a while, depending on the device).
Geospatial distribution, gridded, of reported offenses and crimes per year
```python
fig = hv.Scatter3D((df.X, df.Y, df.year),
                   kdims=['X', 'Y'], vdims=['year'])
fig.opts(hv.opts.Scatter3D(azimuth=65, elevation=15, s=0.2, alpha=0.3,
                           fig_size=600, show_legend=True, color='year', cmap='Category20',
                           fontsize=20))
fig.relabel("Offenses and crimes, overall per period, 2016 - curr.")
# plt.savefig(PATH_TO_IMAGES + 'Offenses and crimes, overall per period, 2016-curr.png', dpi=600)
```
Geospatial distribution, gridded, of reported offenses and crimes per crime category
Crimes and offenses might be grouped by their meta-category from the National Incident-Based Reporting System:
```python
print(*df['Crime Against'].cat.categories, sep=', ')
fig = hv.Scatter3D((df.X, df.Y, df['Crime Against codes']),
                   kdims=['X', 'Y'], vdims=['Crime Against codes'])
fig.opts(hv.opts.Scatter3D(azimuth=70, elevation=20, s=0.2, alpha=0.3, fig_size=600, show_legend=True,
                           color='Crime Against codes', cmap='fire',
                           fontsize=20))
fig.relabel("Offenses and crimes, overall per category, per period, 2016 - curr.")
# plt.savefig(PATH_TO_IMAGES + 'Offenses and crimes, overall per category per period, 2016-curr.png', dpi=600)
```
```python
# references:
# https://justinbois.github.io/bootcamp/2019/lessons/l34_holoviews.html
# https://github.com/holoviz/holoviews/issues/1794
fig1 = hv.Scatter3D(data=df,
                    kdims=['X', 'Y'],
                    vdims=['Crime Against codes', 'year']).groupby(['year'])
fig1.opts(hv.opts.Scatter3D(azimuth=70, elevation=20, s=0.2, alpha=0.3,
                            fig_size=600, show_legend=True, cmap='fire',
                            fontsize=20))
fig1.relabel("Offenses and crimes per category per year")
```
(the part below is under reconstruction)
Let's look at the densities of our crimes and offenses.
We saw these peaks above, but statistical algorithms may prove or disprove our impressions about where the peaks are.
```python
import warnings
warnings.filterwarnings('ignore')
```
```python
%%time
for cat_ag in sorted(df['Crime Against'].unique(), reverse=True)[0:3]:
    subset = df[df['Crime Against'] == cat_ag]
    try:
        # The category may contain too many points to plot, so take a plain
        # random sample to limit the computational cost.
        df_to_plot = subset.sample(40000)
    except ValueError:
        # Fewer than 40000 rows in this category: plot all of them.
        df_to_plot = subset
    sns.jointplot(df_to_plot['X'], df_to_plot['Y'], kind='kde', height=10)
    plt.suptitle(cat_ag)
    # plt.savefig(PATH_TO_IMAGES + 'by_category_{}.png'.format(cat_ag), dpi=800)
```
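A non-graphical cross-check of the same peaks: a 2-D histogram over the coordinates, where the fullest cell approximates the densest area. A sketch with synthetic coordinates (a planted cluster plus uniform noise, not the real data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic incident coordinates: a dense cluster plus uniform background noise.
cluster = rng.normal(loc=(-83.05, 42.40), scale=0.005, size=(500, 2))
noise = rng.uniform(low=(-83.3, 42.25), high=(-82.9, 42.45), size=(100, 2))
pts = np.vstack([cluster, noise])

# Bin the points and locate the cell with the highest count.
counts, x_edges, y_edges = np.histogram2d(pts[:, 0], pts[:, 1], bins=40)
ix, iy = np.unravel_index(counts.argmax(), counts.shape)
peak = (x_edges[ix:ix + 2].mean(), y_edges[iy:iy + 2].mean())
print(peak)  # close to the planted cluster centre (-83.05, 42.40)
```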
The data is not provided, but the problem is known and real. Let's try to estimate the number of crimes and offenses inside the inner but non-accountable parts of the city.
Historically, neither Hamtramck nor Highland Park was part of Detroit, so they are outside the Detroit Police Department, its statistics, and even its jurisdiction; we therefore have no precise offense counts for these two enclaves inside Detroit. We understand that these old borders do not stop crime.
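A back-of-the-envelope estimate, assuming (wrongly, but usably as a first pass) spatially uniform incident density: scale the citywide count by the enclave's share of the area. All numbers below are placeholders, not real figures:

```python
# Hypothetical inputs -- replace with real figures before drawing conclusions.
city_area_km2 = 350.0        # rough stand-in for Detroit's land area
enclave_area_km2 = 5.4       # rough stand-in for one enclave's area
reported_offenses = 200_000  # placeholder citywide count over the study period

# Uniform-density assumption: offenses per km^2, scaled to the enclave.
density = reported_offenses / city_area_km2
estimated_enclave_offenses = density * enclave_area_km2
print(round(estimated_enclave_offenses))  # 3086 with these placeholder inputs
```

The KDE plots above show the density is far from uniform near the enclave borders, so this is only a crude first-pass bound, not an estimate to report.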
```python
%%time
for indx, year in enumerate(sorted(df.year.unique(), reverse=True)):
    g = sns.jointplot(df[df.year == year]['X'], df[df.year == year]['Y'],
                      kind='kde', height=10)  # alpha=0.005
    plt.suptitle(year)
    # g.savefig(PATH_TO_IMAGES + 'density_evaluation_{}.png'.format(year), dpi=600)
    plt.show()
```
Analytically:
Technically
